feat(support): add support service with WebSockets and Yamux #47
edospadoni wants to merge 28 commits into main
🔗 Redirect URIs Added to Logto
The following redirect URIs have been automatically added to the Logto application configuration:
Redirect URIs:
Post-logout redirect URIs:
These will be automatically removed when the PR is closed or merged.
🤖 My API structural change detected
Structural change details
Added (14)
Modified (5)
Powered by Bump.sh
c62b877 to 007bd6d
tunnel-client binary (linux/amd64)
Download:
Quick start
# Make it executable
chmod +x tunnel-client-linux-amd64
# Run it
./tunnel-client-linux-amd64 \
--url wss://my-proxy-qa-pr-47.onrender.com/support/api/tunnel \
--key <SYSTEM_KEY> \
--secret <SYSTEM_SECRET>
Parameters
Service discovery modes
The tunnel-client auto-detects the environment:
Diagnostics plugin system
At connect time, the tunnel-client collects a health snapshot and sends it to MY over the tunnel. Operators see the results directly in the support session popover, before opening a terminal or proxy, so they have immediate context on the system state.
How it works:
Built-in plugin (
External plugins: any executable file placed in
Each plugin must:
#!/bin/bash
# /usr/share/my/diagnostics.d/10-myservice.sh
STATUS="ok"
SUMMARY="all good"
if ! systemctl is-active --quiet myservice; then
STATUS="critical"
SUMMARY="myservice is not running"
fi
echo "{\"id\":\"myservice\",\"name\":\"My Service\",\"status\":\"$STATUS\",\"summary\":\"$SUMMARY\"}"
exit $([ "$STATUS" = "ok" ] && echo 0 || echo 2)
The overall session status shown in MY is the worst status across all plugins (critical > warning > ok). If a plugin exceeds its timeout it is marked If
Environment variables
All flags can also be passed as env vars:
export SUPPORT_URL=wss://my-proxy-qa-pr-47.onrender.com/support/api/tunnel
export SYSTEM_KEY=<your-key>
export SYSTEM_SECRET=<your-secret>
./tunnel-client-linux-amd64
Show a clickable headset icon next to system name when an active support session exists. The popover displays session status, dates, and connected operators with per-node terminal badges. Backend now tracks terminal disconnect times via access log lifecycle (insert returns ID, disconnect updates disconnected_at).
…able rate limits
Refactor the tunnel-client from a single 1181-line main.go into organized internal packages (config, connection, discovery, models, stream, terminal). Rename traefik.go to nethserver.go with updated function names and log messages. Replace YAML config with EXCLUDE_PATTERNS env var / --exclude flag for service filtering. Improve api-cli error logging to include stderr output.
Add configurable rate limiting via env vars (RATE_LIMIT_TUNNEL_PER_IP, RATE_LIMIT_TUNNEL_PER_KEY, RATE_LIMIT_SESSION_PER_ID, RATE_LIMIT_WINDOW) with session limit raised from 100 to 500 req/min. Add build-tunnel-client and run-tunnel-client Makefile targets.
Shift migrations to avoid conflict with 017_inventory_fk_set_null added on main.
f56f203 to 2fbfa44
At connect time, the tunnel-client collects a health report and pushes it to the support service over a dedicated yamux stream. Operators see the results in the session popover before opening a terminal or proxy.
Built-in system plugin always runs (CPU load, RAM, disk, uptime, OS info). External plugins can be dropped as executables in /usr/share/my/diagnostics.d/ - NS8 modules and NethSecurity can ship their own health checks independently. Each plugin writes JSON to stdout and signals severity via exit code (0=ok, 1=warning, 2=critical). The overall session status is the worst status across all plugins.
Diagnostics run in parallel with the WebSocket connection to avoid adding latency. A per-plugin timeout (default 10s) and a total timeout (default 30s) prevent slow plugins from blocking the session.
- tunnel-client: new internal/diagnostics package (runner + models), built-in system check, DIAGNOSTICS yamux stream after manifest
- support service: acceptControlStream distinguishes DIAGNOSTICS header from manifest JSON, SaveDiagnostics() stores JSONB on session
- backend: GET /api/support-sessions/:id/diagnostics with RBAC scoping, migration 021 adds diagnostics + diagnostics_at columns
- frontend: diagnostics section in SupportSessionPopover with status dot and per-plugin summary rows
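The exit-code convention and the worst-status rule described above can be illustrated with a small Go sketch. This is not the code from the internal/diagnostics package; the type and function names (`Status`, `StatusFromExitCode`, `Worst`) are hypothetical, and treating unknown exit codes as critical is an assumption.

```go
package main

import "fmt"

// Status is a plugin severity. Higher values are worse, so the worst
// status of a set is simply its maximum.
type Status int

const (
	StatusOK Status = iota
	StatusWarning
	StatusCritical
)

func (s Status) String() string {
	switch s {
	case StatusOK:
		return "ok"
	case StatusWarning:
		return "warning"
	default:
		return "critical"
	}
}

// StatusFromExitCode maps the documented exit codes (0, 1, 2) to a status.
// Treating every other code as critical is an assumption of this sketch.
func StatusFromExitCode(code int) Status {
	switch code {
	case 0:
		return StatusOK
	case 1:
		return StatusWarning
	default:
		return StatusCritical
	}
}

// Worst returns the most severe status in the slice, or ok when it is empty.
func Worst(statuses []Status) Status {
	worst := StatusOK
	for _, s := range statuses {
		if s > worst {
			worst = s
		}
	}
	return worst
}

func main() {
	exitCodes := []int{0, 1, 0} // hypothetical results from three plugins
	statuses := make([]Status, 0, len(exitCodes))
	for _, c := range exitCodes {
		statuses = append(statuses, StatusFromExitCode(c))
	}
	fmt.Println("overall:", Worst(statuses)) // prints "overall: warning"
}
```

Ordering the constants by severity keeps the aggregation to a single comparison per plugin.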
Operators can now inject arbitrary host:port services into a running tunnel session without reconnection, enabling access to LAN devices (IP phones, switches) through the support proxy.
- Backend: POST /support-sessions/:id/services with RBAC, validation, and Redis pub/sub dispatch (add_services action)
- Support service: SendCommandToSession() opens outbound yamux stream, writes COMMAND 1\n + JSON payload, waits for OK/ERROR
- Tunnel-client: accept loop pre-reads first line to route COMMAND vs CONNECT streams; thread-safe serviceStore with sync.RWMutex
- Frontend: Add Service modal with name/target/label/TLS fields; 1500ms delay before re-fetching services to account for async round-trip
- OpenAPI: documented new endpoint with Conflict response component
- README: added COMMAND stream table, Static Service Injection section
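As a rough illustration of the "pre-read the first line" routing mentioned above, the following Go sketch reads one header line from a stream and classifies it. It uses only the standard library and a plain `io.Reader` in place of a yamux stream. The function names are hypothetical, and the exact shape of the CONNECT header (here `CONNECT <argument>`) is an assumption, since only the `COMMAND 1\n` form is spelled out in this PR.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// classifyHeader inspects the first line of a stream (newline already
// stripped) and returns its kind and the rest of the line.
func classifyHeader(line string) (string, string, error) {
	kind, arg, _ := strings.Cut(line, " ")
	switch kind {
	case "COMMAND", "CONNECT":
		return kind, arg, nil
	default:
		return "", "", fmt.Errorf("unknown stream header %q", line)
	}
}

// routeStream pre-reads the header line and hands back the buffered reader,
// positioned at the payload, so bytes already buffered are not lost.
func routeStream(r io.Reader) (string, string, *bufio.Reader, error) {
	br := bufio.NewReader(r)
	line, err := br.ReadString('\n')
	if err != nil {
		return "", "", nil, fmt.Errorf("read stream header: %w", err)
	}
	kind, arg, err := classifyHeader(strings.TrimRight(line, "\r\n"))
	return kind, arg, br, err
}

func main() {
	// Stand-in for a server-initiated stream carrying an add_services command.
	stream := strings.NewReader("COMMAND 1\n{\"action\":\"add_services\"}")
	kind, arg, payload, err := routeStream(stream)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	body, _ := io.ReadAll(payload)
	fmt.Printf("%s version=%s payload=%s\n", kind, arg, body)
}
```

Returning the `*bufio.Reader` rather than the original reader matters: the buffered reader may already have consumed part of the payload while looking for the newline.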
Fixes 10 security issues identified in the pen-test review of the static service injection and diagnostics features:
- SSRF bypass in applyAddServices (HMAC-signed Redis commands, server pre-check, and client-side validateTarget)
- Diagnostics JSON schema validation, 512 KB size cap, and DB-enforced rate limit across reconnections
- Diagnostic plugins rejected if not owned by root or writable by others; sanitized environment strips credentials
- host:port validation uses net.SplitHostPort with numeric range check
- DIAGNOSTICS stream version validated as exact "DIAGNOSTICS 1"
- serviceStore total cap (500) prevents unbounded growth
- Diagnostics goroutine starts only after yamux session is established
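The host:port check listed above might look roughly like the sketch below. It shows only the syntactic part (`net.SplitHostPort` plus a numeric range check). The SSRF rules enforced by validateTarget are not reproduced here, and `checkHostPort` is a hypothetical name, not the function from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
)

// checkHostPort splits a "host:port" target and verifies that the host is
// non-empty and the port is a number between 1 and 65535.
func checkHostPort(target string) (string, uint16, error) {
	host, portStr, err := net.SplitHostPort(target)
	if err != nil {
		return "", 0, fmt.Errorf("invalid target %q: %w", target, err)
	}
	if host == "" {
		return "", 0, errors.New("target has an empty host")
	}
	// ParseUint with bitSize 16 rejects signs, non-digits and values > 65535.
	port, err := strconv.ParseUint(portStr, 10, 16)
	if err != nil || port == 0 {
		return "", 0, fmt.Errorf("invalid port %q: must be 1-65535", portStr)
	}
	return host, uint16(port), nil
}

func main() {
	for _, target := range []string{"192.168.1.100:443", "example.com:70000", "example.com", ":443"} {
		host, port, err := checkHostPort(target)
		if err != nil {
			fmt.Printf("%-20s rejected: %v\n", target, err)
			continue
		}
		fmt.Printf("%-20s ok: host=%s port=%d\n", target, host, port)
	}
}
```

Parsing the port as an unsigned 16-bit value folds the range check into the parse, leaving only port 0 to reject by hand.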
Remote apps (NethVoice, NethCTI) proxied through different subdomains
make cross-origin API calls that require CORS headers and shared cookie
authentication across sibling subdomains of the same support session.
Backend:
- Move CORS middleware from router to /api group so it does not
intercept /support-proxy/* routes
- Add CORS preflight (OPTIONS 204) and response headers for
same-session sibling subdomains (validated by session slug match)
- Scope proxy cookie to .support.{domain} with SameSite=Lax so it
is shared across all service subdomains of the same session
- Remove per-service token validation: session ID match is sufficient
since users have session-level access
Support service:
- Fix non-deterministic hostname rewriting in buildHostRewriteMap:
when multiple services share the same original hostname, the current
service's proxy subdomain is always preferred, keeping API calls
same-origin and letting Traefik handle path-based routing
d683765 to 50624ac
…er display
Add GET /api/support-sessions/diagnostics?system_id=X endpoint that returns diagnostics for all active sessions of a system grouped by node, with an overall_status reflecting the worst across all nodes. Update the frontend popover to show collapsible per-node sections for multi-node NS8 clusters while keeping the flat list for single-node systems.
Tunnel-client creates temporary users when a session starts and removes them when it ends, giving operators access to remote admin interfaces without requiring customer credentials.
NS8: creates cluster-admin (Redis) + domain users per local LDAP/Samba provider. Worker nodes fetch credentials from the leader via USERS_FETCH yamux stream.
NethSecurity: creates local admin user via nethsec Python module.
Plugin system (users.d/): executable scripts configure applications for the support user. The tunnel-client passes --instances-file with module context (instances, domains, services) so plugins can configure per-instance credentials.
Frontend: unified Services & Credentials modal replaces the old service dropdown, showing cluster admin, domain credentials per module accordion, and clickable service links.
- Fix re-discovery overwriting injected services: the serviceStore now tracks COMMAND-injected services separately and preserves them when periodic re-discovery replaces discovered services.
- Add remove_services COMMAND: tunnel-client removes injected services from its store and re-sends the manifest. The support service also removes them server-side immediately for instant API consistency.
- Add DELETE /api/support-sessions/:id/services/:name endpoint to remove custom services via the frontend.
- Rename "Other Services" to "Custom Services" in the frontend with a delete button (trash icon) for each custom service.
- Frontend: re-fetch services on modal open for fresh data.
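One way to keep injected services alive across re-discovery is to hold the two sets in separate maps behind a single `sync.RWMutex`. The sketch below is written under that assumption and is not the serviceStore from this PR: the `Service` fields, method names, and lookup order (injected first) are hypothetical, and only the total cap of 500 comes from the commit messages.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Service is a hypothetical, simplified service entry.
type Service struct {
	Name   string
	Target string
	TLS    bool
}

const maxServices = 500 // total cap mentioned in the security fixes

var errStoreFull = errors.New("service store is full")

type serviceStore struct {
	mu         sync.RWMutex
	discovered map[string]Service
	injected   map[string]Service
}

func newServiceStore() *serviceStore {
	return &serviceStore{discovered: map[string]Service{}, injected: map[string]Service{}}
}

// ReplaceDiscovered swaps in the result of a re-discovery run. The injected
// map is never touched here, so COMMAND-injected services survive.
func (s *serviceStore) ReplaceDiscovered(svcs []Service) {
	next := make(map[string]Service, len(svcs))
	for _, svc := range svcs {
		next[svc.Name] = svc
	}
	s.mu.Lock()
	s.discovered = next
	s.mu.Unlock()
}

// Inject adds or updates an injected service, enforcing the total cap.
func (s *serviceStore) Inject(svc Service) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, exists := s.injected[svc.Name]
	if !exists && len(s.discovered)+len(s.injected) >= maxServices {
		return errStoreFull
	}
	s.injected[svc.Name] = svc
	return nil
}

// Remove deletes an injected service; discovered services are not removable.
func (s *serviceStore) Remove(name string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.injected[name]
	delete(s.injected, name)
	return ok
}

// Lookup checks injected services first, then discovered ones (assumption).
func (s *serviceStore) Lookup(name string) (Service, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if svc, ok := s.injected[name]; ok {
		return svc, true
	}
	svc, ok := s.discovered[name]
	return svc, ok
}

func main() {
	store := newServiceStore()
	_ = store.Inject(Service{Name: "phone", Target: "192.168.1.100:443", TLS: true})
	store.ReplaceDiscovered([]Service{{Name: "web", Target: "127.0.0.1:9090"}})
	store.ReplaceDiscovered(nil) // a later discovery run finds nothing
	_, phoneOK := store.Lookup("phone")
	_, webOK := store.Lookup("web")
	fmt.Println("phone kept:", phoneOK, "web kept:", webOK)
}
```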
- Make INTERNAL_SECRET mandatory at startup (fail-fast), remove fallback that accepted unsigned Redis commands and unauthenticated internal requests when secret was empty
- Add RBAC scope verification to CloseSupportSession and ExtendSupportSession to prevent cross-tenant session manipulation
- Clear ephemeral credentials (users JSONB) from database on session close, expire, and replace to limit credential exposure window
- Add HTTP server timeouts (ReadHeaderTimeout, IdleTimeout) to prevent slowloris denial-of-service attacks
- Re-validate service target DNS at proxy connection time to prevent TOCTOU DNS rebinding attacks (previously only validated at manifest registration)
- Move plugin temp files from /tmp to /var/run/my-support-tmp/ with 0700 permissions to prevent inotify-based credential snooping
- Add PostgreSQL advisory lock on session creation to prevent race conditions when two tunnel-clients connect simultaneously
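For reference, the two timeouts named above are plain fields on Go's `http.Server`. The sketch below shows where they go. The durations and the `newServer` helper are assumptions chosen for illustration, not the values used in this PR; the `:8082` address is the support service port shown in the architecture diagram.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newServer builds an http.Server with header-read and idle timeouts set.
// The durations are illustrative assumptions.
func newServer(addr string, handler http.Handler) *http.Server {
	return &http.Server{
		Addr:    addr,
		Handler: handler,
		// Bounds how long a client may take to send request headers,
		// which is what a slowloris-style client tries to stretch.
		ReadHeaderTimeout: 10 * time.Second,
		// Bounds how long an idle keep-alive connection is kept open.
		IdleTimeout: 120 * time.Second,
	}
}

func main() {
	srv := newServer(":8082", http.NewServeMux())
	fmt.Println("read header timeout:", srv.ReadHeaderTimeout)
	fmt.Println("idle timeout:", srv.IdleTimeout)
	// srv.ListenAndServe() would start serving; omitted in this sketch.
}
```

ReadTimeout and WriteTimeout are left unset in this sketch, on the assumption that whole-request deadlines would cut off long-lived tunnel and terminal connections.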
Plugin Systems:
| Check | Rule |
|---|---|
| File type | Must be a regular file (no symlinks, no directories) |
| Executable | Must have at least one execute bit set (0o111) |
| Ownership | Must be owned by root (UID 0) or the tunnel-client process UID |
| Write permissions | Must not be group-writable or world-writable (0o022 mask) |
| Environment | Plugins run with a minimal environment (PATH only) — no inherited secrets |
| Timeout | Per-plugin timeout enforced via context.WithTimeout |
| Output limit | Stdout capped (512 KB for diagnostics, 64 KB for users) |
| Temp files | Credential files (--users-file, --instances-file) are written to /var/run/my-support-tmp/ (0700) and deleted after execution |
If any check fails, the plugin is skipped and the reason is written to the log; no error is surfaced to the operator.
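The file-type, executable-bit, and write-permission rows of the table can be checked from the file mode alone. The Go sketch below illustrates those three checks under stated assumptions: `checkPluginMode` and `makeTempPlugin` are hypothetical names, and the ownership check is left out because it needs platform-specific stat data.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// checkPluginMode applies the mode-based rules: regular file, at least one
// execute bit, and no group or world write bit. Ownership is not checked.
func checkPluginMode(path string) error {
	info, err := os.Lstat(path) // Lstat does not follow symlinks
	if err != nil {
		return err
	}
	mode := info.Mode()
	if !mode.IsRegular() {
		return fmt.Errorf("%s: not a regular file", path)
	}
	if mode.Perm()&0o111 == 0 {
		return fmt.Errorf("%s: no execute bit set", path)
	}
	if mode.Perm()&0o022 != 0 {
		return fmt.Errorf("%s: group- or world-writable", path)
	}
	return nil
}

// makeTempPlugin creates a throwaway script with the given permissions,
// used only to exercise checkPluginMode.
func makeTempPlugin(perm os.FileMode) (string, error) {
	dir, err := os.MkdirTemp("", "plugin-check-")
	if err != nil {
		return "", err
	}
	path := filepath.Join(dir, "10-example.sh")
	if err := os.WriteFile(path, []byte("#!/bin/sh\nexit 0\n"), 0o600); err != nil {
		return "", err
	}
	// Chmod explicitly: the mode passed to WriteFile is filtered by the umask.
	return path, os.Chmod(path, perm)
}

func main() {
	for _, perm := range []os.FileMode{0o755, 0o775, 0o644} {
		path, err := makeTempPlugin(perm)
		if err != nil {
			fmt.Println("setup failed:", err)
			return
		}
		fmt.Printf("%04o -> %v\n", uint32(perm), checkPluginMode(path))
		os.RemoveAll(filepath.Dir(path))
	}
}
```

Using `os.Lstat` rather than `os.Stat` is what makes a symlink fail the regular-file check instead of being judged by its target.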
Plugin Naming Convention (users.d only)
The plugin filename determines module matching:
- If a plugin is named nethvoice, the tunnel-client checks if any discovered NS8 module has base name nethvoice (stripping trailing digits: nethvoice103 → nethvoice)
- If a match is found, --instances-file is passed with all matching instances, their domains, labels, node IDs, and service routes
- If no match is found, the plugin runs without --instances-file (useful for generic plugins that don't need module context)
This enables a single nethvoice plugin to configure all NethVoice instances on the cluster.
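The matching rule can be sketched in a few lines of Go. The function names are hypothetical and the module list is made up for the example; only the "strip trailing digits, then compare with the plugin filename" rule comes from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// moduleBaseName strips trailing digits from an NS8 module ID,
// e.g. "nethvoice103" becomes "nethvoice".
func moduleBaseName(moduleID string) string {
	return strings.TrimRight(moduleID, "0123456789")
}

// matchInstances returns the module IDs whose base name equals the plugin
// filename. An empty result means the plugin runs without --instances-file.
func matchInstances(plugin string, modules []string) []string {
	var matches []string
	for _, m := range modules {
		if moduleBaseName(m) == plugin {
			matches = append(matches, m)
		}
	}
	return matches
}

func main() {
	modules := []string{"nethvoice103", "nethvoice2", "webtop1", "traefik1"}
	fmt.Println(matchInstances("nethvoice", modules)) // prints [nethvoice103 nethvoice2]
	fmt.Println(len(matchInstances("generic", modules)))
}
```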
Example: Adding a New users.d Plugin
To add support user configuration for a new NS8 module (e.g., webtop):
- Create /usr/share/my/users.d/webtop (executable, owned by root, mode 0755)
- Handle setup and teardown actions
- Read credentials from --users-file (domain user matching the instance's domain)
- Read instance context from --instances-file (module instances, services, domains)
- Output AppConfig JSON array on stdout during setup
See examples/users.d/nethvoice in this PR for a complete reference implementation.
Add comprehensive developer reference for users.d/ plugins: --users-file and --instances-file JSON formats, module name matching, AppConfig output format, and example reference. Add shared plugin security model table covering ownership, permissions, environment, timeouts, and temp file handling for both diagnostics.d/ and users.d/ systems. Update credential lifecycle with database cleanup. Add remove_services command, users/ directory, and examples/ to project structure.
Support Service — Architecture
How it works
A tunnel client on the customer's system opens a persistent WebSocket to our support service. The connection is multiplexed with yamux — one WebSocket carries many parallel streams. When an operator clicks "Open" in the UI, traffic flows through the tunnel to reach the remote service (web UI, terminal, API) as if it were local.
graph LR
  subgraph Customer System
    TC[tunnel-client<br/>yamux mux] --> WU[Web UI]
    TC --> SA[SSH/API]
    TC --> ETC[...]
  end
  TC ---|WebSocket<br/>single connection| SS
  BR[Browser<br/>operator] --> NG[nginx<br/>proxy]
  NG --> BE[Backend :8080<br/>sessions, auth]
  BE --> SS[Support :8082<br/>tunnels, yamux]
Session Lifecycle
stateDiagram-v2
  [*] --> pending
  pending --> active : WebSocket established
  active --> closed : operator closes
  active --> grace_period : disconnect
  grace_period --> active : reconnect (same session)
  grace_period --> expired : timeout (30-60s)
WebSocket + yamux Multiplexing
The tunnel client opens one WebSocket to the support service. On top of it, yamux creates a multiplexed session — like having many TCP connections inside a single one.
How it connects:
- GET /support/api/tunnel with HTTP Basic Auth
- The WebSocket is wrapped as a net.Conn
- yamux.Server is created over the wrapped connection (keepalive 15s)
Server-initiated streams: the support service can also open streams toward the tunnel-client. These start with a COMMAND <version>\n header and carry a JSON payload. The tunnel-client processes the command and responds OK\n or ERROR <msg>\n.
On disconnect: the tunnel enters a grace period (30-60s). If the client reconnects, the same session is reused. If the grace expires, the session is closed and ephemeral credentials are wiped from the database.
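The session lifecycle from the state diagram, including the grace period just described, can be expressed as a small transition table. This Go sketch is illustrative only: the state names follow the diagram, while the event names (`connect`, `close`, `disconnect`, `reconnect`, `timeout`) and the `Next` function are assumptions.

```go
package main

import "fmt"

// State is a session state as named in the lifecycle diagram.
type State string

const (
	Pending     State = "pending"
	Active      State = "active"
	GracePeriod State = "grace_period"
	Closed      State = "closed"
	Expired     State = "expired"
)

// transitions lists the allowed moves. Closed and Expired have no outgoing
// transitions, so they are terminal.
var transitions = map[State]map[string]State{
	Pending:     {"connect": Active},
	Active:      {"close": Closed, "disconnect": GracePeriod},
	GracePeriod: {"reconnect": Active, "timeout": Expired},
}

// Next returns the state reached from s on the given event, or an error
// when the diagram has no such transition.
func Next(s State, event string) (State, error) {
	if next, ok := transitions[s][event]; ok {
		return next, nil
	}
	return s, fmt.Errorf("no transition from %q on %q", s, event)
}

func main() {
	s := Pending
	for _, ev := range []string{"connect", "disconnect", "reconnect", "disconnect", "timeout"} {
		next, err := Next(s, ev)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%s --%s--> %s\n", s, ev, next)
		s = next
	}
}
```

Keeping the table as data makes it easy to reject invalid moves, such as a reconnect arriving after the grace period has already expired.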
Ephemeral User Provisioning
When the tunnel-client connects, it provisions temporary support users on the managed system:
Lifecycle:
/var/run/my-support-users.json), reports them to the server via yamux USERS_REPORT stream
After user provisioning, users.d/ plugins configure applications to accept the support credentials (e.g., creating FreePBX admin entries for NethVoice).
Static Service Injection
Operators can add arbitrary host:port services to a running tunnel without reconnection. This is useful for services not auto-discovered via Traefik, for example the web management interface of a device on the customer's LAN (IP phone, managed switch, NAS, etc.).
Example: to access a Yealink phone's web UI at 192.168.1.100:443 on a customer system, add a service with target: 192.168.1.100:443, tls: true. The phone's interface becomes available at:
…as if the operator were on the same LAN as the phone.
How the UI Proxy works (subdomain)
When an operator clicks a service link (e.g. NethVoice UI), the browser opens a new tab on a dedicated subdomain. Each service gets its own origin, so all the app's absolute paths (/_next/, /api/, /static/) work natively.
The ?token= is removed from the URL after the first request (redirect), so it never leaks in logs, referrer headers, or browser history.
How the Web Terminal works (xterm.js)
The terminal needs a WebSocket from the browser, but browsers can't send Authorization headers on WebSocket connections. Solution: one-time ticket exchanged beforehand.
The tunnel client spawns a PTY (pseudo-terminal) directly on the customer system, with no SSH daemon involved. The PTY output is forwarded as raw bytes through the yamux stream back to the browser's xterm.js.
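The one-time ticket idea can be sketched with an in-memory store whose redeem step reads and deletes in one locked operation, mimicking the get-and-delete semantics that this description attributes to Redis GETDEL. Everything here is a stand-in: `ticketStore`, its methods, the 32-byte ticket length, and the 30-second TTL are assumptions, not the implementation in this PR.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
	"time"
)

const defaultTTL = 30 * time.Second // assumed ticket lifetime

type ticket struct {
	sessionID string
	expires   time.Time
}

// ticketStore is an in-memory stand-in for the Redis-backed ticket storage.
type ticketStore struct {
	mu      sync.Mutex
	tickets map[string]ticket
}

func newTicketStore() *ticketStore {
	return &ticketStore{tickets: map[string]ticket{}}
}

// Issue creates a random ticket bound to a session. The browser would get
// it over an authenticated HTTP call and then present it on the WebSocket.
func (s *ticketStore) Issue(sessionID string, ttl time.Duration) (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	id := hex.EncodeToString(buf)
	s.mu.Lock()
	s.tickets[id] = ticket{sessionID: sessionID, expires: time.Now().Add(ttl)}
	s.mu.Unlock()
	return id, nil
}

// Redeem returns the session for a ticket and deletes it in the same
// critical section, so a ticket can never be used twice.
func (s *ticketStore) Redeem(id string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	t, ok := s.tickets[id]
	if !ok {
		return "", false
	}
	delete(s.tickets, id)
	if time.Now().After(t.expires) {
		return "", false
	}
	return t.sessionID, true
}

func main() {
	store := newTicketStore()
	id, err := store.Issue("session-123", defaultTTL)
	if err != nil {
		fmt.Println("issue failed:", err)
		return
	}
	_, first := store.Redeem(id)
	_, second := store.Redeem(id)
	fmt.Println("first redeem:", first, "second redeem:", second)
}
```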
Why TCP hijacking instead of httputil.ReverseProxy?
httputil.ReverseProxy can't handle WebSocket upgrades. After the 101 Switching Protocols, the connection becomes a raw bidirectional byte stream, not HTTP. The solution is http.Hijacker: take control of the raw TCP socket from Go's HTTP server. Two goroutines then io.Copy bytes in both directions (browser ↔ support service) with no HTTP overhead.
Access Patterns & Auth
- system_key:system_secret (SHA256), 3-tier cache (memory → Redis → DB), rate-limited
- connect:systems permission, RBAC scope verified on all operations
- GETDEL on use → WebSocket via TCP hijack
- {session_id, service_name, org_role} → SameSite=Strict cookie on subdomain
- INTERNAL_SECRET
- X-Session-Token (64-char hex, per-session) + INTERNAL_SECRET (required, fail-fast at startup)
Security Highlights
- GETDEL), JWT never touches the URL
- ?token=, stored as HttpOnly SameSite=Strict cookie, URL cleaned via redirect
- crypto/subtle for all token validations
- frame-ancestors 'self' on proxied responses
Inter-service Communication
Components & Files
- services/support/
- services/support/cmd/tunnel-client/
- backend/methods/support.go, support_proxy.go
- frontend/src/components/support/
- proxy/nginx.conf
- backend/database/migrations/009_*, 018_*–022_*: support_sessions, support_access_logs, diagnostics, users columns
Related Comments
diagnostics.d/ and users.d/: Developer reference for writing diagnostic and user configuration plugins
Testing Environment
To trigger a fresh deployment of all services in the PR preview environment, comment:
Automatic PR environments:
Merge Checklist
Code Quality:
Builds: